We study a general matrix optimization problem with a fixed-rank positive semidefinite (PSD) constraint. We perform the Burer-Monteiro factorization and consider a particular Riemannian quotient geometry in a search space that has a total space equipped with the Euclidean metric. When the original objective f satisfies standard restricted strong convexity and smoothness properties, we characterize the global landscape of the factorized objective under the Riemannian quotient geometry. We show the entire search space can be divided into three regions: (R1) the region near the target parameter of interest, where the factorized objective is geodesically strongly convex and smooth; (R2) the region containing neighborhoods of all strict saddle points; (R3) the remaining regions, where the factorized objective has a large gradient. To our best knowledge, this is the first global landscape analysis of the Burer-Monteiro factorized objective under the Riemannian quotient geometry. Our results provide a fully geometric explanation for the superior performance of vanilla gradient descent under the Burer-Monteiro factorization. When f satisfies a weaker restricted strict convexity property, we show there exists a neighborhood near local minimizers such that the factorized objective is geodesically convex. To prove our results we provide a comprehensive landscape analysis of a matrix factorization problem with a least squares objective, which serves as a critical bridge. Our conclusions are also based on a result of independent interest stating that the geodesic ball centered at Y with a radius 1/3 of the least singular value of Y is a geodesically convex set under the Riemannian quotient geometry, which as a corollary, also implies a quantitative bound of the convexity radius in the Bures-Wasserstein space. The convexity radius obtained is sharp up to constants.
translated by 谷歌翻译
我们研究了张量张量的回归,其中的目标是将张量的响应与张量协变量与塔克等级参数张量/矩阵连接起来,而没有其内在等级的先验知识。我们提出了Riemannian梯度下降(RGD)和Riemannian Gauss-Newton(RGN)方法,并通过研究等级过度参数化的影响来应对未知等级的挑战。我们通过表明RGD和RGN分别线性地和四边形地收敛到两个等级的统计最佳估计值,从而为一般的张量调节回归提供了第一个收敛保证。我们的理论揭示了一种有趣的现象:Riemannian优化方法自然地适应了过度参数化,而无需修改其实施。我们还为低度多项式框架下的标量调整回归中的统计计算差距提供了第一个严格的证据。我们的理论证明了``统计计算差距的祝福''现象:在张张量的张量回归中,对于三个或更高的张紧器,在张张量的张量回归中,计算所需的样本量与中等级别相匹配的计算量相匹配。在考虑计算可行的估计器时,虽然矩阵设置没有此类好处。这表明中等等级的过度参数化本质上是``在张量调整的样本量三分或更高的样本大小上,三分或更高的样本量。最后,我们进行仿真研究以显示我们提出的方法的优势并证实我们的理论发现。
translated by 谷歌翻译
In this paper, we consider the geometric landscape connection of the widely studied manifold and factorization formulations in low-rank positive semidefinite (PSD) and general matrix optimization. We establish a sandwich relation on the spectrum of Riemannian and Euclidean Hessians at first-order stationary points (FOSPs). As a result of that, we obtain an equivalence on the set of FOSPs, second-order stationary points (SOSPs) and strict saddles between the manifold and the factorization formulations. In addition, we show the sandwich relation can be used to transfer more quantitative geometric properties from one formulation to another. Similarities and differences in the landscape connection under the PSD case and the general case are discussed. To the best of our knowledge, this is the first geometric landscape connection between the manifold and the factorization formulations for handling rank constraints, and it provides a geometric explanation for the similar empirical performance of factorization and manifold approaches in low-rank matrix optimization observed in the literature. In the general low-rank matrix optimization, the landscape connection of two factorization formulations (unregularized and regularized ones) is also provided. By applying these geometric landscape connections, in particular, the sandwich relation, we are able to solve unanswered questions in literature and establish stronger results in the applications on geometric analysis of phase retrieval, well-conditioned low-rank matrix optimization, and the role of regularization in factorization arising from machine learning and signal processing.
translated by 谷歌翻译
In this paper, we consider the estimation of a low Tucker rank tensor from a number of noisy linear measurements. The general problem covers many specific examples arising from applications, including tensor regression, tensor completion, and tensor PCA/SVD. We consider an efficient Riemannian Gauss-Newton (RGN) method for low Tucker rank tensor estimation. Different from the generic (super)linear convergence guarantee of RGN in the literature, we prove the first local quadratic convergence guarantee of RGN for low-rank tensor estimation in the noisy setting under some regularity conditions and provide the corresponding estimation error upper bounds. A deterministic estimation error lower bound, which matches the upper bound, is provided that demonstrates the statistical optimality of RGN. The merit of RGN is illustrated through two machine learning applications: tensor regression and tensor SVD. Finally, we provide the simulation results to corroborate our theoretical findings.
translated by 谷歌翻译
在本文中,我们提出{\ it \下划线{r} ecursive} {\ it \ usef \ undesline {i} mortance} {\ it \ it \ usew supsline {s} ketching} algorithM squares {\ it \下划线{o} ptimization}(risro)。 Risro的关键步骤是递归重要性草图,这是一个基于确定性设计的递归投影的新素描框架,它与文献中的随机素描\ Citep {Mahoney2011 randomized,Woodruff2014sketching}有很大不同。在这个新的素描框架下,可以重新解释文献中的几种现有算法,而Risro比它们具有明显的优势。 Risro易于实现,并在计算上有效,其中每次迭代中的核心过程是解决降低尺寸最小二乘问题的问题。我们在某些轻度条件下建立了Risro的局部二次线性和二次收敛速率。我们还发现了Risro与Riemannian Gauss-Newton算法在固定等级矩阵上的联系。在机器学习和统计数据中的两种应用中,RISRO的有效性得到了证明:低级别矩阵痕量回归和相位检索。仿真研究证明了Risro的出色数值性能。
translated by 谷歌翻译
Weakly-supervised object localization aims to indicate the category as well as the scope of an object in an image given only the image-level labels. Most of the existing works are based on Class Activation Mapping (CAM) and endeavor to enlarge the discriminative area inside the activation map to perceive the whole object, yet ignore the co-occurrence confounder of the object and context (e.g., fish and water), which makes the model inspection hard to distinguish object boundaries. Besides, the use of CAM also brings a dilemma problem that the classification and localization always suffer from a performance gap and can not reach their highest accuracy simultaneously. In this paper, we propose a casual knowledge distillation method, dubbed KD-CI-CAM, to address these two under-explored issues in one go. More specifically, we tackle the co-occurrence context confounder problem via causal intervention (CI), which explores the causalities among image features, contexts, and categories to eliminate the biased object-context entanglement in the class activation maps. Based on the de-biased object feature, we additionally propose a multi-teacher causal distillation framework to balance the absorption of classification knowledge and localization knowledge during model training. Extensive experiments on several benchmarks demonstrate the effectiveness of KD-CI-CAM in learning clear object boundaries from confounding contexts and addressing the dilemma problem between classification and localization performance.
translated by 谷歌翻译
In this paper, a semantic communication framework for image transmission is developed. In the investigated framework, a set of servers cooperatively transmit images to a set of users utilizing semantic communication techniques. To evaluate the performance of studied semantic communication system, a multimodal metric is proposed to measure the correlation between the extracted semantic information and the original image. To meet the ISS requirement of each user, each server must jointly determine the semantic information to be transmitted and the resource blocks (RBs) used for semantic information transmission. We formulate this problem as an optimization problem aiming to minimize each server's transmission latency while reaching the ISS requirement. To solve this problem, a value decomposition based entropy-maximized multi-agent reinforcement learning (RL) is proposed, which enables servers to coordinate for training and execute RB allocation in a distributed manner to approach to a globally optimal performance with less training iterations. Compared to traditional multi-agent RL, the proposed RL improves the valuable action exploration of servers and the probability of finding a globally optimal RB allocation policy based on local observation. Simulation results show that the proposed algorithm can reduce the transmission delay by up to 16.1% compared to traditional multi-agent RL.
translated by 谷歌翻译
New architecture GPUs like A100 are now equipped with multi-instance GPU (MIG) technology, which allows the GPU to be partitioned into multiple small, isolated instances. This technology provides more flexibility for users to support both deep learning training and inference workloads, but efficiently utilizing it can still be challenging. The vision of this paper is to provide a more comprehensive and practical benchmark study for MIG in order to eliminate the need for tedious manual benchmarking and tuning efforts. To achieve this vision, the paper presents MIGPerf, an open-source tool that streamlines the benchmark study for MIG. Using MIGPerf, the authors conduct a series of experiments, including deep learning training and inference characterization on MIG, GPU sharing characterization, and framework compatibility with MIG. The results of these experiments provide new insights and guidance for users to effectively employ MIG, and lay the foundation for further research on the orchestration of hybrid training and inference workloads on MIGs. The code and results are released on https://github.com/MLSysOps/MIGProfiler. This work is still in progress and more results will be published soon.
translated by 谷歌翻译
With the development of technology and sharing economy, Airbnb as a famous short-term rental platform, has become the first choice for many young people to select. The issue of Airbnb's pricing has always been a problem worth studying. While the previous studies achieve promising results, there are exists deficiencies to solve. Such as, (1) the feature attributes of rental are not rich enough; (2) the research on rental text information is not deep enough; (3) there are few studies on predicting the rental price combined with the point of interest(POI) around the house. To address the above challenges, we proposes a multi-source information embedding(MSIE) model to predict the rental price of Airbnb. Specifically, we first selects the statistical feature to embed the original rental data. Secondly, we generates the word feature vector and emotional score combination of three different text information to form the text feature embedding. Thirdly, we uses the points of interest(POI) around the rental house information generates a variety of spatial network graphs, and learns the embedding of the network to obtain the spatial feature embedding. Finally, this paper combines the three modules into multi source rental representations, and uses the constructed fully connected neural network to predict the price. The analysis of the experimental results shows the effectiveness of our proposed model.
translated by 谷歌翻译
Domain adaptive detection aims to improve the generalization of detectors on target domain. To reduce discrepancy in feature distributions between two domains, recent approaches achieve domain adaption through feature alignment in different granularities via adversarial learning. However, they neglect the relationship between multiple granularities and different features in alignment, degrading detection. Addressing this, we introduce a unified multi-granularity alignment (MGA)-based detection framework for domain-invariant feature learning. The key is to encode the dependencies across different granularities including pixel-, instance-, and category-levels simultaneously to align two domains. Specifically, based on pixel-level features, we first develop an omni-scale gated fusion (OSGF) module to aggregate discriminative representations of instances with scale-aware convolutions, leading to robust multi-scale detection. Besides, we introduce multi-granularity discriminators to identify where, either source or target domains, different granularities of samples come from. Note that, MGA not only leverages instance discriminability in different categories but also exploits category consistency between two domains for detection. Furthermore, we present an adaptive exponential moving average (AEMA) strategy that explores model assessments for model update to improve pseudo labels and alleviate local misalignment problem, boosting detection robustness. Extensive experiments on multiple domain adaption scenarios validate the superiority of MGA over other approaches on FCOS and Faster R-CNN detectors. Code will be released at https://github.com/tiankongzhang/MGA.
translated by 谷歌翻译